A system for automatically creating databases containing industry, service, product and [...]
pages from HTML, XML or SGML encoded web pages posted on computer networks such as
the Internet or Intranets. The web pages containing HTML, XML or SGML encoded CCG-data,
database update controls and web browser display controls are created and modified by using
simple text editors, HTML, XML or SGML editors or purpose built editors. The CCG databases
may be searched for references (URLs) to web pages by use of enquiries which reference one
or more of the items of the CCG-data. Alternatively, enquiries referencing the CCG-data in the
databases may supply contact data without web page references. Data duplication and
coordination is reduced by including in the web page CCG-data display controls which are
used by web browsers to format for display the same data that is used to automatically update
the databases.

TITLE: NETWORK BASED CLASSIFIED INFORMATION SYSTEMS



FIELD OF INVENTION

This invention relates to network based classified information systems, to methods of
automatically building searchable databases of classified information derived from web pages
posted on a network, and to web pages for use in such systems and methods.

The information systems and databases of most relevance to this invention are those which
include classified product and service catalogues similar to the Yellow Pages telephone books,
contact indexes similar to the White Pages telephone books, and/or subject indexes similar to
library catalogues. Such information systems and databases typically include sets of
associated classification, contact and/or geographic items of information. For convenience,
classification, contact and/or geographic information will be hereinafter called CCG-data.

The networks with which this invention is concerned are the worldwide public
computer/communications network commonly known as the Internet and private networks -
sometimes called intranets - which allow common access to markup documents on computers
connected to the network. Markup documents are text files prepared using various markup
languages such as HyperText Markup Language (HTML) and Extensible Markup Language
(XML) which are implementations (or dialects) of the Standard Generalised Markup Language
(SGML). The system of accessible files on the Internet is called the World Wide Web (WWW)
and the markup documents themselves are commonly called "web pages". A web page is said
to be "posted" on a network when it is stored on computer-readable media of a host network
computer as a file which is generally accessible to network users. A web page is transported
from the host computer to a requesting computer through intermediate network computers as
a computer-readable signal embodied in a carrier wave. Though this invention is not limited to
Internet based information systems, these terms are used for convenience.

BACKGROUND TO THE INVENTION

It has been estimated that there are about 100 million web pages on the Internet and that the
number is doubling every two years. Many of these pages include information concerning
commercially offered goods and services and often include contact details. But the difficulty of
locating such information is increasing faster than the growth in the number of web pages.

To assist network users to locate web pages of interest, certain network service providers create
indexes (or databases) of the contents of web pages posted (stored on computer readable
media so as to be generally accessible) on the network and provide "search engines" to use
the indexes. These indexes are often created automatically by the use of "web crawlers" which
(i) interrogate computer after computer on the network to locate successive web pages and (ii)
index the words in each web page encountered against the network address (eg Internet
Protocol Address or IPA) and filing system path or universal resource locator (URL) at which
the web page is accessible. Hereinafter the terms URL and URI (Uniform Resource Identifier)
are taken to be identical in meaning and to signify network addresses and filing system paths.
Usually, the indexes consist of a list of unique words with each word having an associated list
of URLs of the web pages wherein the word was found to occur during interrogation. The URL
serves as a "hyperlink" which, if selected by a user/searcher, results in the associated web
page being automatically transmitted from the computer where it is posted on the network to
the user/searcher's computer where it may be displayed or otherwise processed. The sending
and receiving of files in this way is greatly assisted by user interface programs called "web
browsers" (or more simply, browsers) such as Netscape and Microsoft Internet Explorer.
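
By way of illustration only, the word index built by such a web crawler might be sketched as
follows. The page texts, the example.com URLs and the function name build_index are
assumptions made for this sketch and do not describe any particular search engine.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each unique word to the sorted list of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        # Lower-case and split on non-alphanumerics so "Plumber," matches "plumber".
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return {word: sorted(urls) for word, urls in index.items()}


# Hypothetical pages standing in for documents retrieved by a crawler.
pages = {
    "http://example.com/a": "Smith Plumbing, emergency plumber",
    "http://example.com/b": "Smith Street bakery",
}
index = build_index(pages)
print(index["smith"])    # both pages contain the word
print(index["plumber"])  # only the first page contains the word
```

The index records only that the word "smith" occurs in both pages. It holds nothing that
distinguishes a surname from a street name.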

The search for web pages of interest using search engines leaves much to be desired:

• simple searches (those using a few keywords in simple combinations) often yield far too
many web page references (URLs) to permit them to be interrogated one-by-one,

• complex searches (those using many keywords and/or complex Boolean expressions)
require considerable expertise to undertake,

• even using optimum search criteria, many irrelevant web pages are referenced because of
inconsistent use of terminology by those who author the original web pages,

• even using optimum search criteria, many relevant pages are missed, again because of
inconsistent use of terminology by web page authors, and

• items of information included in the body of web pages cannot be "understood" or
associated in useful ways by web crawlers; that is, recognised as, say, a surname, a street
name, a geographic locality, or type of goods or services and, say, a surname strongly
associated with a street name, a geographic locality, or a type of goods or service.

The result is that information provided by search engines from databases which are
automatically compiled using web crawlers is a very poor equivalent of the common Yellow
Pages and White Pages directories which serve the telephone industry (though these
directories are not, of course, automatically compiled from web pages).

In an attempt to improve the usefulness of automatically compiled network databases, some
search engine providers make use of information contained in URLs, such as the country code
and top level domain name codes such as "com", "edu", "net" and "org" which are sometimes used
to signify the subject matter of web pages. It has been proposed to add more content
classifying codes to URLs (eg "chem" to signify chemical subject matter) to allow specialised
databases - national commercial, chemical, etc - to be generated. However, this proposal
has serious drawbacks:

• URLs are Internet addresses and it is in principle undesirable to confuse the address
function of a URL with that of representing a list of web page classifications or contact
details.

• A URL is an inappropriate container of multiple web page classification codes and contact
details because the length of the URL would cause it to become unwieldy as an Internet
address.

• Including in a URL classification codes drawn from a list of thousands of codes would
compromise the mnemonic quality of Internet addresses such as "www.yellowpages.com".

• There is substantial overlap in the subject matter contained in web pages having the
various top level domain name codes.

• There is no consensus on, or standard for, content classification codes in URLs.

Another proposal to add content classification data to web pages has arisen from the wish to
identify pages containing material that may be offensive to some viewers, or should not be
accessed by minors. The Platform for Internet Content Selection (PICS) (see
http://www.w3.org/pub/WWW/PICS and other documents at www.w3.org) is a web page
ratings standard similar in principle to the ratings systems for motion pictures. This system
allows page authors to "internally" self classify their pages through use of the "<meta...>"
HTML element. Alternatively, "external" PICS ratings of web pages may be obtained from
ratings service providers accessed each time a URL is selected. In practice, the ratings service
providers have adopted a very limited range of web page classifications. For example, Ararat
Software's Commercial Rating System (see http://www.ararat.com/ratings/ararat10.html)
provides just 5 categories of web page content: commercial content, technical/customer
support, ordering information, downloading information and contact information. In other
examples, CyberPatrol (http://www.microsys.com/pics/pics.msi.htm) provides 16 categories,
the Recreational Software Advisory Council (http://www.rsac.org/[...]) provides 4 [...],
SafeSurf (http://www.safesurf.com/ssplan.htm) provides 11 categories and [...]
categories. None of the categories provide classification of web pages by industry, service,
product or subject with sufficient specificity to be useful when searching for web pages.

[...] to prevent web browsers from displaying web pages
unsuitable for particular [...] of web browser users. Such rating systems are not intended to
[...] the automated creation of Yellow or White Pages like databases from web pages
and are unsuitable for that purpose because they cannot represent contact [...].
The ratings data may only be encoded in the <meta...> element in the <head> of [...]
document, drastically limiting the type and usefulness of the data that can be encoded.

Another proposal to classify the content of web pages, the "Meta Content Framework" (MCF)
(see http://mcf.research.apple.com/mcf.html), requires the content of web pages to be
classified and the classification data to be held in a separate non-HTML data file with a MIME
[...]. The [...] of non-HTML encoded documents to [...] the content of
HTML encoded documents is a technical and economic barrier to the adoption by search
engine providers of this proposal. The MCF proposal is thus entirely unsuited to the automated
creation of Yellow or White Pages like databases from HTML encoded web pages (MIME type
text/html) because data stored according to the MCF proposal is not stored in [...]
web pages.

The [...] "vCard" proposal (see "The Electronic Business Card", Version
2.1 Specification, Sept 18, 1996 or
ftp://ds.internic.net/internet-drafts/draft-ietf-asid-mime-vcard-01.txt) uses a non-HTML data
file (MIME Content Types of "text/plain" or the non-standard "text/X-vCard") containing
contact information equivalent to an extended [...]. [...] exchanged on a network using
Simple Mail Transfer Protocol [...] be associated with a web page by use of a URL in the
web page [...] information (eg <a href="[...].vcf">). [...] 2.1 [...] data file format
(published 18 September 1996) [...] of [...] information. The vCard specification
[...] possible, there should be consistent mapping of vCard property
names to [...] attribute names [...] vCard property name "TITLE" [...] to
HTML <input name="title">. The intention is to facilitate the transfer of vCard data into web
page input forms by pasting from a clipboard or by dragging from other computer applications.
The [...] proposal is unsuited to the automated creation of Yellow or White Pages like
databases [...] encoded [...] data stored according to the vCard
proposal is not stored in HTML encoded web pages.

Storing classified information in separate documents (such as Meta Content files or
vCards) has the disadvantage that there is necessarily much duplication of data and
[...] modification [...] between the [...] and the web page [...]
[...] using an HTML compliant browser
[...] worth calling up the associated file or vice versa. Also, to allow
[...] textual information would have to be
duplicated in the separate document. vCards in particular do not provide this functionality
[...] non-HTML documents such as vCards [...]
font, size, colour of the text and other elements of the document are of great importance. The
restriction of address data in a vCard to untagged ordinally organised fields is inflexible. For
example, multiple instances of extended parts of the address are not possible. Also,
components of names, addresses and telephone numbers and so forth are insufficiently
identified.

The Online Computer Library Center Inc (OCLC, Dublin, Ohio, USA) proposal, known as the
"Dublin Core", proposes classifying scholarly web pages by subject (topic of the work, or
keywords that describe the content of the work), title, author, publisher, other agent, date,
object type (genre of the object such as home page, novel, poem etc), form, identifier, source,
language, relationship and coverage (spatial and temporal) (see
http://www.oclc.org:5[...] and other documents at www.oclc.org). This
proposal does not include industry, service, product or subject classifications; it also does not
include contact details. Names such as that of the author are not specified in sufficient detail to
avoid ambiguities such as which are the author's first and last names. The proposal specifies
that the details are encoded using the <meta...> element in the <head> of web pages. The
proposal is unsuited to the automated creation of Yellow or White Pages like databases from
web pages because the proposal does not provide for classification of web pages and does
not provide adequate contact details. Further, the use of keywords for describing the content
of the work adds very little to the effectiveness of indexing of web pages since the web pages
are usually indexed on every word of their content and most often the keywords would simply
be a duplication of words already contained in the document.

It has also been proposed to use the Dewey Decimal System (see
http://orc.rsch.oclc.org:6109/evaLdc.html and http://orc.rsch.oclc.org:6109/bintro.html) to rank
electronic documents against a Dewey Decimal subject classification. The proposal suggests
automatically assigning Dewey Decimal subject classification codes to documents during
automated indexing and cataloguing but does not specify the exact nature of the assignment,
although it is implied that the codes are stored separately from the documents. The proposal
admits that such automated classification is less satisfactory than human classification. The
proposal is unsuited to the automated creation of Yellow or White Pages like databases from
web pages because the accuracy of classification is inadequate and because it does not provide
for inclusion of industry, service or product classifications or for inclusion of contact
details. Deriving a subject classification code from an analysis of every word and phrase in a
web page is also computationally expensive.

The HTML 3.0 standard (see page 23 of the www.w3.org document
"draft-ietf-html-specv3-00.txt") provides "class" as an attribute of almost all HTML "<body>"
elements. The "class" attribute is intended to be used with style sheets. Style sheets provide a
means by which the display of HTML documents may be altered to suit the needs of different
classes of browser users. For example, <div class="appendix"> could be used to define a
division that acts as an appendix, <h2 class="section"> could be used to define a level 2 header
that acts as a section header, although, of course, any string of characters could be defined for
those purposes. The "class" attribute, although never having been suggested for holding goods
and services classifications, is not suited for such a use as it is, in any case, undesirable to
confuse the style sheet function of the "class" attribute.

The HTML 3.0 and earlier standards provided the HTML elements "<person>" and "<address>"
but do not specify the form of the content or method of validating the content of those
elements. A person's name may be written as first name followed by last name or last name
followed by first name. Similarly, different conventions exist for writing addresses. Similar
ambiguities arise in the ill defined format of the HTML elements "<person>" and "<address>".
As such they are of little use in the automatic compilation of searchable databases.

The XML language (see: http://textuality.com/sgml-erb/WD-xml.html) was developed to extend
HTML so that software vendors can add new elements and new element attributes to HTML
which are not specifically defined in any HTML standard. The intention is to ensure that all new
elements and attributes could be parsed by all XML parsers even if the new elements held no
significance for any particular XML parser. However, like HTML, XML does not provide a
standard for the representation of industry, service, product or subject classification, contact or
geographic location details within a web page.

Of course, many useful databases of the Yellow Pages or White Pages type are made
available by service providers on networks, but they are not compiled automatically by using
web crawlers to scan HTML web pages posted on a network. For example,
http://www.yellowpages.com.au and http://www.mcp.com provide classified advertisements of
the Yellow Pages type with links to the web pages of paying advertisers or subscribers. There
are also directories of email addresses which approximate the White Pages directories, listing
the names of individuals and organisations and contact details (eg http://www.bigbook.com
and http://query1.whowhere.com). However, these email directories require listers to manually
add their directory entries and enquirers to be aware of and to find the directory enquiry web
page. They cannot be automatically generated by scanning web pages using web crawlers
since there is no adequate mechanism to relate email addresses to the names of people and
organisations and their other contact details which may also exist in the same web page.

OBJECTIVES OF THE INVENTION

The general object of the invention is to provide improved methods for automatically building
searchable databases of classification, contact, and/or geographical information by using web
crawlers to interrogate web pages posted on a network. (For convenience, the information is
collectively referred to as CCG-data.)

Other non-essential objectives are to provide methods for including and/or displaying
CCG-data within web pages accessed by browsers, for automatically extracting CCG-data from web
pages posted on a network and for using the same, and/or to provide methods for searching
automatically compiled databases using such data.

Another subsidiary objective of the invention is to provide a new form of web page which is
better suited to the automatic compilation (using web crawlers) of databases constructed by
the automatic scanning of many such pages posted on a network.

OUTLINE OF THE INVENTION

The invention is based upon the realisation that highly useful databases can be automatically
built by successively interrogating web pages posted on a network if one or more HTML
encoded CCG phrases are included in the web pages. A CCG phrase is one containing
CCG-data in a form which is directly accessible and identifiable. CCG phrases may also include one
or more items which provide the web page author with control over how the CCG-data is
applied to the database.

Data duplication can be reduced if some of the CCG-data in the coded CCG phrases can be
displayed by browsers as well as being used to update databases. Errors due to inexactly
duplicated data are also eliminated. Accordingly, it is envisaged that CCG phrases may include
one or more items which provide the web page author with control over how the CCG-data is
displayed by a browser.

HTML (including version 2 and version 3) and XML are evolving applications (sub-sets or
dialects) of ISO Standard 8879:1986 known as Standard Generalised Markup Language
(SGML). HTML, in large part, is a language used to describe how text (unstructured data) and
graphics are to be formatted for display. The HTML language consists of a finite number of
"elements" (for example, "<BR>" where "BR" is the element name, also called the tag name)
which may contain "attributes" (for example, "<DL COMPACT>" where "COMPACT" is an
attribute named "COMPACT") and may contain values associated with attributes (for example,
"<FONT SIZE=+1>" where +1 is the attribute value of the attribute named "SIZE"). XML is a
language used to describe structured data. The XML language is similarly composed of
elements, attributes and values with a similar syntax to HTML but, unlike HTML, the element
names which may be used are not restricted and the meaning of the XML data may be
interpreted in any convenient manner. While the XML language is mute about how data
described by XML is to be formatted for display, the data may be used by computer programs
for any purpose including description of how XML coded data is displayed. However, due to its
historic importance in connection with web pages, the term "HTML" is herein used to refer to all
markup languages which are subsets or complete sets of the SGML language. In particular,
the term "HTML encoded CCG phrase" and the synonymous term "CCG phrase" are herein
used to refer to CCG-data encoded in a subset or complete set of the SGML language.
Herein, a "web page" is a document adapted to be or actually accessible through a network
and encoded in a subset or complete set of the SGML language.
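
The element, attribute and value terminology can be illustrated with a short sketch that uses
the html.parser module of the Python standard library. The class name TagLister is an
assumption of this sketch. Note that this parser reports element and attribute names in
lower case.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Record (element name, attribute name, attribute value) for each start tag."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        if not attrs:
            # An element with no attributes, such as <BR>.
            self.seen.append((tag, None, None))
        for name, value in attrs:
            # value is None for an attribute written without a value, such as COMPACT.
            self.seen.append((tag, name, value))


lister = TagLister()
lister.feed("<BR><DL COMPACT><FONT SIZE=+1>text</FONT></DL>")
for entry in lister.seen:
    print(entry)
```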

For convenience, CCG items in HTML encoded CCG phrases, whether they are syntactically
represented as elements or as attributes, will be referred to hereinafter as CCG attributes.

A CCG phrase includes at least one of the following identifiable types of CCG-data attributes:

• industry, product, service, and/or subject classifications,

• contact categories, contact person(s) and/or organisation(s) names, titles or
associations, contact details including physical and postal addresses, telephone and
fax numbers, email and Internet or network addresses or locations, public keys, and

• geographic location details.

A CCG phrase may also include any of the following identifiable types of CCG control
attributes:

• database control attributes to indicate which parts of the data are to be used to
update databases, and

• display control attributes to indicate how browsers are to display the data.

By virtue of occurring in the same CCG phrase, a plurality of CCG-data attributes are
associated with each other.

By virtue of their occurrence in the same CCG phrase, CCG-data attributes are identified as
a set of associated attributes. However, the degree of association between attributes can be
controlled by the inclusion in the phrase of database control attributes.

The start and end of CCG phrases should be identifiable to clearly distinguish these phrases
from other data. To identify the beginning and end of a CCG phrase, at least one HTML
element should have a CCG specific HTML element name or CCG specific attribute name or
CCG specific value. Each CCG attribute may consist, with or without other incidental
characters, of a CCG attribute name and/or a CCG value or values. Preferably, each CCG
phrase is contained in the "<body>" of the web page.

Two examples of a CCG specific HTML element are: "<CCG ...>" (or "<CCG ... />") and
"<CCG> ... </CCG>". (Where a CCG phrase is coded in XML, the elements "<XML>" and
"</XML>" may also be needed at the start and end of the CCG phrase.) A less satisfactory
example is: "<!--CCG ...-->" where the characters "CCG" after the HTML comment element name
"!--" are used to signify that the comment contains CCG-data. An example of the use of a CCG
specific attribute name is: "<START CCG>"..."<END CCG>". An example of the use of a CCG
specific value is: "<START TYPE=CCG>"..."<END TYPE=CCG>". Obviously, other
character strings could be substituted for the element name, element attribute name or
element attribute value "CCG" string of the examples.

The codes "<CCG ...>" and "<CCG ... />" are compatible with most HTML specifications but,
being non-standard HTML, most web browsers do not display any text or attributes (eg
PQ="AQDT") within the angle brackets "<" and ">". These codes are preferred where display of
the CCG data is not required and compatibility with older browsers is required (eg CCG
phrases containing only classification values).
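
As a minimal sketch of how a program might recognise CCG phrases written as "<CCG ...>"
elements, the following uses the html.parser module of the Python standard library. The
attribute names industry, org and city and the class name CCGPhraseFinder are hypothetical
and are chosen only for illustration; they are not attribute names defined by this specification.

```python
from html.parser import HTMLParser


class CCGPhraseFinder(HTMLParser):
    """Collect the attributes of every <CCG ...> element found in a page."""

    def __init__(self):
        super().__init__()
        self.phrases = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases element and attribute names.
        # Self-closing elements are also routed here by the default handler.
        if tag == "ccg":
            self.phrases.append(dict(attrs))


# Hypothetical page; the attribute names are assumptions of this sketch.
page = """
<html><body>
<p>Welcome to our shop.</p>
<CCG industry="plumbing" org="Smith Plumbing" city="Perth">
<CCG industry="bakery" />
</body></html>
"""
finder = CCGPhraseFinder()
finder.feed(page)
print(finder.phrases)
```

Each dictionary keeps together the attributes that occurred in one phrase, so the association
between them is not lost when the page text around them is discarded.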

From one aspect, therefore, the invention comprises a web page for posting on a network, the
web page being characterised by the inclusion of at least one CCG phrase in the "<body>" of
the page, the CCG phrase being such that the CCG attributes contained therein are
accessible and identifiable by (i) HTML compliant editors and/or (ii) HTML compliant web
crawlers for the automatic construction of databases of classified information, and/or (iii) HTML
compliant browsers for display on the computer screens of network users.

From another aspect, the invention comprises a method of constructing web pages of the
above described type. The web pages may be constructed on digital computers using simple
text editors such as Microsoft Windows Notepad or, preferably, purpose built human controlled
editors or automated composing programs which embody knowledge of HTML and CCG
syntax and grammar. Whichever process is used, CCG attributes are selected and inserted,
modified, deleted and/or organised to form valid CCG phrases in HTML encoded documents
and the documents are posted on computer readable storage devices of computers connected
to a computer network so that the documents are generally available to computers on the
network.

From another aspect, the invention comprises a method of populating a database with
CCG-data extracted from web pages. Web pages posted on a network are successively retrieved by
a digital computer program (eg a web crawler) and CCG phrases contained therein are
identified and at least some of the CCG attributes found within the CCG phrases are extracted.
The CCG attribute names are used to determine the type of data in the associated values.
Generally the CCG attributes of interest are those relating to classification, contact and
geographic data and database update controls while the attributes of little or no interest in
relation to database updating are those relating to display controls. Of course, the CCG-data
extracted need only be that relevant to the particular database being updated. For example,
one database may have been designed to index only web page classifications and URLs while
another database may have been designed to index only contact details. Databases also differ
in their internal representation of data and means of associating data. For example, some use
"flat file" tables, others use pointers to data to create network associations while others use
hashing and buckets.
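
The selection of only those attributes relevant to a particular database can be sketched as
follows. The attribute names and the two groupings are hypothetical; this specification does
not fix them at this point.

```python
# Hypothetical attribute names grouped by the kind of CCG-data they carry.
CLASSIFICATION = {"industry", "product", "service", "subject"}
CONTACT = {"org", "person", "phone", "email"}


def select_for_database(phrase, wanted):
    """Keep only the attributes whose names the target database indexes."""
    return {name: value for name, value in phrase.items() if name in wanted}


# One extracted CCG phrase; "colour" stands in for a display control attribute.
phrase = {
    "industry": "plumbing",
    "org": "Smith Plumbing",
    "phone": "555 0100",
    "colour": "red",
}

classification_row = select_for_database(phrase, CLASSIFICATION)
contact_row = select_for_database(phrase, CONTACT)
print(classification_row)
print(contact_row)
```

The display control attribute is ignored by both databases, in keeping with its being of little
or no interest for database updating.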

The conventional nomenclature differs considerably between different types of database.
Depending on the particular database nomenclature, data of the same type is said to be stored
in table columns, fields, attributes and properties. The terms column and field are somewhat
related to the physical representation of the data in files while attribute and property are more
related to the logical representation of data. To avoid confusion with the terms "HTML
attribute", "CCG attribute" or just "attribute", hereinafter a database property means both a type
of data stored in the database and a place in the database where data of the same type is
stored. Database properties are referred to by a name ("property name") or similar reference
and contain values. For example, a database property with the name "City name" and which
contains values which are all the names of cities may be defined as a "City name" type
database property.

Whichever style of database is used, it is preferred that the database update program relate
the CCG attributes to corresponding database properties used by the database update
process so that the database property values are updated with CCG values in a manner which
preserves the distinctness, content and meaning of the CCG values and, preferably, preserves
the CCG value associations expressed in the CCG phrase as sets of associated database
property values of different types.
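
One way such an update program could relate CCG attributes to database properties is
sketched below using the sqlite3 module of the Python standard library. The table name ccg,
the column names and the mapping PROPERTY_FOR are assumptions of this sketch. Each
CCG phrase is written as one table row so that its values remain associated as a set.

```python
import sqlite3

# Hypothetical mapping from CCG attribute names to database property (column) names.
PROPERTY_FOR = {
    "industry": "industry_class",
    "org": "organisation_name",
    "city": "city_name",
}


def update_database(conn, phrase):
    """Insert one CCG phrase as one row so its values stay associated as a set."""
    row = {PROPERTY_FOR[name]: value for name, value in phrase.items() if name in PROPERTY_FOR}
    if not row:
        return  # nothing in this phrase maps to a known property
    # Column names come only from the fixed mapping above; values are bound as parameters.
    columns = ", ".join(row)
    marks = ", ".join("?" for _ in row)
    conn.execute(f"INSERT INTO ccg ({columns}) VALUES ({marks})", tuple(row.values()))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ccg (industry_class TEXT, organisation_name TEXT, city_name TEXT)")
update_database(conn, {"industry": "plumbing", "org": "Smith Plumbing", "city": "Perth"})
update_database(conn, {"industry": "bakery", "city": "Perth"})

rows = conn.execute(
    "SELECT industry_class, organisation_name FROM ccg "
    "WHERE city_name = ? ORDER BY industry_class",
    ("Perth",),
).fetchall()
print(rows)
```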

In some cases, it is desired to know the address of the web page from which the CCG values
were extracted. For example, the purpose of building a database might be to allow searching
of the database by web page classification to provide a list of URLs of web pages or URLs of
portions of web pages which contain matching CCG classifications. The URLs could then be
inserted in an HTML document and transmitted to a web browser as a list of references to web
pages matching a search expression. In that example, associating the URL of a web page or
the URL of a portion of a web page with the CCG values extracted from the same web page or
web page portion is important and the URL or means of reconstructing it must be available and
supplied to the database update process. In one style of database, the values of the same
type are held in separate rows in a column (property) of a database table, and pointers held in
another column (property) are associated with the values by sharing the same table row. The
table row constitutes a set of associated property values. Each pointer points to a bucket
(block of data) containing a list of URLs or pointers to URLs held in a separate bucket or table.
In another style of database, values of different types are held in different tables together with
a set number, pointer or similar code which is used to indicate which values are associated as
members of the same set. In one variation, the values of set members are prefixed with a code
indicating the type of value and all values are held in the same column of a table. If the
purpose of the database is to hold contact data, recording the web page URL in the database
might not be required although, if the URL is not present in the database, updating changes in
the CCG contact details contained within a web page is more difficult. Of course, one
database may be used to record all types of CCG values contained in web pages and
associate with each other any and all values extracted from the same web page or even from
other web pages.
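
The first style of database described above, in which each property value shares a row with a
pointer to a bucket of URLs, might be sketched in memory as follows. The class name
ClassificationIndex and the example.com URLs are assumptions of this sketch.

```python
class ClassificationIndex:
    """Property values paired with pointers to buckets, each bucket holding URLs."""

    def __init__(self):
        self.pointer_for = {}  # property value -> bucket number (the "pointer")
        self.buckets = []      # bucket number -> list of URLs

    def add(self, value, url):
        """Associate a URL with a property value, creating its bucket if needed."""
        if value not in self.pointer_for:
            self.pointer_for[value] = len(self.buckets)
            self.buckets.append([])
        bucket = self.buckets[self.pointer_for[value]]
        if url not in bucket:
            bucket.append(url)

    def urls(self, value):
        """Follow the pointer for a value and return a copy of its bucket."""
        pointer = self.pointer_for.get(value)
        return [] if pointer is None else list(self.buckets[pointer])


index = ClassificationIndex()
index.add("plumbing", "http://example.com/a")
index.add("plumbing", "http://example.com/c#contact")  # URL of a portion of a page
index.add("bakery", "http://example.com/b")
print(index.urls("plumbing"))
```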

From another aspect, the invention comprises a method of searching the databases
constructed as outlined above. These databases may be used for a variety of searching
purposes. For example, they may be used to find web page URLs by using the association of
web page URLs with industry, service, product or subject classification or a person's or
organisation's name or address or geographic location values or any combination thereof. In
another example, the databases may be used to find the contact details for people or
organisations by name or location or industry, service, product or web page subject type and
so forth by using the association between items of the contact details in the database without
having to retrieve web pages associated with the contact details.

More particularly, the searching method involves finding URL references, or finding sets of
associated database property values, from databases containing CCG-data. The method
includes the steps of parsing a query phrase received from a computer network to extract query
relational expressions and, from each expression, deriving a query field name, query relational
operator and query value; determining the type of the query field by reference to its name;
relating the query field to a corresponding database property according to type; and locating
those database property values in the related database property which return a true value when
tested against the query value using the query relational operator. Finally, the URL references
or sets of values associated with the located CCG-data database property values are retrieved.
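
By way of illustration only, the searching steps described above can be sketched in Python. The field names, the field-to-property mapping, the row layout and the rule that several expressions in one phrase are combined with AND are all assumptions made for this sketch and are not taken from the specification.

```python
import operator
import re

# Hypothetical mapping from query field names to database property names.
FIELD_TO_PROPERTY = {"suburb": "ACN", "org": "ON", "class": "CC"}

# Relational operators supported by this sketch.
OPERATORS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt, ">": operator.gt}

# One relational expression: field name, relational operator, quoted or bare value.
EXPRESSION = re.compile(r'(\w+)\s*(!=|=|<|>)\s*("[^"]*"|\S+)')


def parse_query(phrase):
    """Return (property name, operator function, query value) triples."""
    triples = []
    for field, op, value in EXPRESSION.findall(phrase):
        # Relate the query field to a database property by its name.
        prop = FIELD_TO_PROPERTY[field.lower()]
        triples.append((prop, OPERATORS[op], value.strip('"')))
    return triples


def search(rows, phrase):
    """Return the URLs of rows for which every expression tests true."""
    triples = parse_query(phrase)
    return [
        row["URL"]
        for row in rows
        if all(prop in row and test(row[prop], value) for prop, test, value in triples)
    ]
```

A phrase such as `suburb=Kelso` would, under these assumptions, return the URLs of all rows whose "ACN" property equals "Kelso". An unknown field name raises `KeyError` in this sketch; a practical system would need to infer or default the field instead.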



Database queries are usually expressed in a query language in the form of a phrase or
sentence. In query by example style enquiry systems, the user types values into input fields on
a form and a program extracts the input values and uses the values to automatically compose
a query phrase or sentence. There are many existing examples of query languages used in
connection with databases. Generally, they consist of relational expressions (eg Field=Value),
logical expressions and grouping of relational and logical expressions by means such as
bracketing. They may also contain sorting and output formatting expressions. Often
abbreviated notation is used in the expressions, such as leaving out field names or relational
operators which are then inferred from the value in the expression or implied by default. In an
enquiry the nature and format of the output may also be implied, such as a list of URLs of web
pages or a list of contact details. Whatever the mechanism of any particular database, the
query expression needs to be parsed and fields in the query expression, explicit, default,
implied or inferred, need to be related to database properties of similar type. In some styles of
database enquiry the query expression is evaluated against each row of a table or record of a
file to find rows or records (ie a set of associated property values) which match the query
expression. In other styles, sub-sets of the values of the properties are selected according to
the relational expressions in the query expression and the sub-sets are
combined according to logical and grouping expressions in the query to find the sets of
associated property values which match the query expression. Often, to make the logical
operators which combine the selected sub-sets more efficient, it is not the values which are
selected but pointers to the values (eg table name and table row) or unique keys (eg URLs)
associated with the values. For example, the AND logical operator is often
implemented so that only values or pointers or keys common to both sets are
found in the combined list. Usually, the query produces a result list which is then provided to
other processes. For example, a list of URLs of web pages is processed to produce an
attractively formatted HTML encoded document containing the URLs and is sent to a web
browser so that the user can retrieve interesting web pages. In another example, the contact
details held in the database are retrieved from the database, formatted as a report in the form
of an HTML encoded document and sent to a web browser for viewing.
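
The combination of selected sub-sets by logical operators can be sketched as follows, by way of illustration only. The index layout (a dictionary keyed by property name and value, holding URL keys) and the function names are assumptions of this sketch; the specification does not prescribe a particular data structure.

```python
def select(index, prop, value):
    """Return the set of URL keys whose property `prop` equals `value`."""
    return set(index.get((prop, value), ()))


def combine_and(*subsets):
    """AND: keep only the keys common to every selected sub-set."""
    result = set(subsets[0])
    for subset in subsets[1:]:
        result &= subset
    return result


def combine_or(*subsets):
    """OR: keep the keys found in any selected sub-set."""
    result = set()
    for subset in subsets:
        result |= subset
    return result
```

Operating on keys rather than on the values themselves keeps the combined sub-sets small, which is the efficiency point made in the paragraph above.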

From another aspect, this invention comprises a method of displaying CCG-data contained in
CCG phrases within web pages which are displayed by a web browser executing on a digital
computer. While a web page is loading or has loaded in a web browser, the web browser
parses the web page and displays the text (or data) of the web page on a display device
connected to the computer. When the web browser parser encounters CCG phrases, the web
browser may display the CCG-data (element and/or attribute names (or translations of element
and/or attribute names) and/or values) in a number of browser specific ways. For example, the
web browser may by default not display any CCG-data, display all CCG-data, not display any
CCG-data until a CCG display control attribute explicitly states that subsequent data should be
displayed, or display all CCG-data until a CCG display control attribute explicitly states that
subsequent data should not be displayed. The web browser may also use CCG display
controls specifying the size, font, position and so forth to alter the display of the CCG-data.
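
By way of illustration only, the default and explicit display behaviours described above can be sketched as a small state machine. The representation of a CCG phrase as a list of (name, value) pairs in document order, and the `show_by_default` parameter standing for the browser's default, are assumptions of this sketch.

```python
def visible_values(items, show_by_default=True):
    """Return the values a browser would display.

    `items` is a list of (name, value) pairs in document order. A SHOW or HIDE
    control stays in force until the opposite control is encountered.
    """
    showing = show_by_default
    shown = []
    for name, value in items:
        if name == "SHOW":
            showing = True
        elif name == "HIDE":
            showing = False
        elif showing:
            shown.append(value)
    return shown
```

With `show_by_default=False` nothing is displayed until a SHOW control is met; with the default of `True` everything is displayed until a HIDE control is met.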

DESCRIPTION OF EXAMPLES 

Having indicated the nature of the present invention, examples or embodiments thereof will
now be described by way of illustration only.

Example 1 : HTML Syntax Suitable for Representing a CCG Phrase 

The following is an example of HTML element syntax suitable for representing CCG phrases in
which a control (e.g. 'SHOW') may be "good until countermanded" and thus apply to more
than one field:
<CCG HREF="url"
  {{NAME="label" | ID="identifier_code"} &| {LANG="language_code" &
  CLASS="Class name"}
  {
  {SET_SEPARATOR} &|
  {INDEX | NOINDEX} &|
  {SHOW | HIDE} &|
  {XPOS="horizontal_position_number"} &|
  {YPOS="vertical_position_number"} &|
  {NEWLINE} &|
  {ALIGN=centre | left | right | justify} &|
  {SIZE=[+/-]1|2|3|4|5|6|7} &|
  {COLOR="#rrggbb" | "colour_name"} &|
  {FACE="type_face_name"} &|
  {BLINK &| BOLD &| UNDERLINE &| ITALIC &| STRIKE} &|
  {SUBSCRIPT | SUPERSCRIPT} &|
  {CLEAR{=left | right | all}}
  {NORMAL} &|
  {{{CONTACT &| COPYRIGHT &| DEVELOPER} &|
  {PERSONAL &| BUSINESS &| ASSOCIATION} &|
  {attribute_name="attribute_value(s)"}
  }



where: the ellipsis implies optional repetition of the braced ('{' '}') items; the braces are
used to group items and are not CCG syntactic elements; "&" (and) implies items must occur
together, "|" (or) implies only one item must occur, and "&|" (and/or) implies any including none
of the items may appear together.

Using the syntax of this example, each CCG phrase is represented as an HTML element, the
element name being "CCG", and the CCG-data (eg attribute_name="attribute_value") and CCG
controls (eg SIZE=+1) are represented as attributes of the HTML element. Some of the
attributes (eg SIZE) have explicit values (eg +1) and some attributes have implied values
depending on their presence or absence in a CCG phrase (eg when the attribute BUSINESS is
present it has the implied value of True and the implied value of False when absent).

Representation in XML syntax requires, at most, only a simple translation. All the items, such
as "NORMAL" and "attribute_name", may remain unchanged as attributes of the element
named "CCG" (eg <CCG size=+1/>). However, when a CCG phrase is encoded in XML, it is
preferred that the items are represented as XML elements. For example attribute "SIZE=+1"
can be represented as element "<size>+1</size>" or "<size value=+1/>" and "NORMAL" can
be represented as "<normal/>".
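
By way of illustration only, the translation from the attribute form to the preferred XML element form can be sketched with Python's standard `xml.etree.ElementTree` module. The function name and the representation of a CCG phrase as (name, value) pairs, with `None` marking a valueless item such as NORMAL, are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET


def ccg_attributes_to_xml(attributes):
    """Translate (name, value) pairs into child elements of a CCG element.

    A value of None stands for an item with only an implied value, which is
    written as an empty element.
    """
    root = ET.Element("CCG")
    for name, value in attributes:
        child = ET.SubElement(root, name)
        if value is not None:
            child.text = value
    return ET.tostring(root, encoding="unicode")
```

The reverse translation would walk the child elements of "CCG" and emit each tag and its text as an attribute name and value.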

In this example, the attributes ID, LANG and CLASS take their meanings from HTML 3.0. The
"url" in HREF="url" may be a link with or without destination anchor labels. For example the
URL http://www.w3.org/docs.html does not contain a destination anchor label (or identifier)
while http://www.w3.org/docs.html#searching does contain the destination anchor label
"#searching" which is intended to refer to an anchor in docs.html such as <A
NAME="searching">...</A>. There is some confusion in various HTML standards
documentation about the distinction between the expression NAME="label" and the expression
ID="identifier_code". For most practical purposes the two expressions have the same function
or meaning: to uniquely identify within a document a position in or portion of that document.

Database control attributes:

"Set_separator" indicates the end of association between preceding and following data other
than through the weaker mutual association with the same CCG phrase or web page; the data
are divided into sets. "Index | Noindex" indicates that the following data are / are not to be
indexed by a web crawler. These attributes have an implied attribute value of 'True' if present
and 'False' when absent from a CCG phrase.

Display control attributes:

"Show | Hide" indicates that a browser should show / not show the following data. "Xpos" and
"Ypos" indicate the position (for example in pixel or physical units) on the browser screen where
the data is to be displayed; "Newline" may be used in addition or as an alternative method of
placing text on a browser screen. "Align" indicates the positioning of data on a browser screen
relative to the cursor position set by "Xpos", "Ypos" or "Newline". "Size", "Colour" and "Face"
indicate the size, colour and type face or font of the following data when displayed on a
browser screen. "Blink", "Bold", "Underline", "Italic", "Strike", "Superscript" and "Subscript"
indicate that the following data should be displayed blinking, bold, underlined, italicised, struck
through, superscripted or subscripted. "Clear" indicates that the browser screen in the region
where data will be displayed should be cleared to background before displaying the following
data. "Normal" indicates the data is to be displayed without the "Blink" ... "Clear"
characteristics. The display controls which consist of an attribute name without an explicit value
have an implied value of 'True' when present and 'False' when absent.

CCG-data attributes:

"Contact &| Copyright &| Developer" indicates that the following CCG-data refers to details for
a person or organisation and/or to the copyright owner and/or to the HTML or web page
developer. "Personal &| Business &| Association" indicates that the following data refers to
details for a person and/or business and/or association. The previous CCG-data attributes
have an implied attribute value of 'True' if present in a CCG phrase or set and 'False' when
absent from a CCG phrase or set. The attribute_name could be standard CCG attribute names
or synonyms of standard CCG attribute names or abbreviations of CCG attribute names which
refer to the following types of CCG attribute values, where square brackets '[' and ']' surround
suggested attribute names:
• industry or service or product or subject classifications and sub-classifications:
    • classification name [CN],
    • classification codes [CC],
• display only text [TEXT],
• contact
• person:
    • courtesy title [PNC],
    • first given name [PNG],
    • other given names [PNO],
    • family name [PNF],
    • name suffix [PNS],
    • qualifications [PQ],
    • associations [PA],
    • contact person title [PT],
    • contact person role [PR],
• organisation:
    • name [ON],
    • unit [OU],
    • identifier [OID],
• physical or post or delivery address:
    • type [AT] (= "PHYSICAL" &| "POST-OFFICE" &| "POSTAL" &| "DELIVERY"),
    • post office box number [AP#],
    • post office name [APN],
    • room or suite or office or unit or flat or apartment name &| number [AB#],
    • floor name &| number [ABF],
    • building name [ABN],
    • lane or street or road or highway number [AS#],
    • lane or street or road or highway name [ASN],
    • suburb or town or city name [ACN],
    • region or state or territory or province name [ARN],
    • post code [APC],
    • country or nation name [ANN],
• telephone:
    • type [TT] (= "PREFERRED" &| "VOICE" &| "MOBILE" &| "CAR" &| "MESSAGE" &|
      "PAGER" &| "FACSIMILE" &| "MODEM" &| "ISDN" &| "VIDEO"),
    • nation or country code number [TC#],
    • trunk access number [TT#],
    • area code number [TA#],
    • local number [TL#],
• email:
    • type [ET] (= "INTERNET" | {other}),
    • mailer [EM],
    • address [EA],
• Internet address:
    • url [IURL],
• date & time:
    • date & time from [DTF],
    • date & time to [DTT],
    • weekday from [DTWF],
    • weekday to [DTWT],
    • weekday time from [DTWFT],
    • weekday time to [DTWTT],
    • time zone [DTZ],
• brand name [BN],
• public key:
    • key type [KT],
    • key [K],
• geographical:
    • location units [GLU],
    • location [GL],
    • serviced region units [GLRU],
    • serviced region [GLR].

Suggested attribute name [CN] is the name of an attribute associated with the attribute value
containing 'classification name' type data. For example, the [CN] attribute value could be the
name of a proprietary or national or international or other industry classification standard such
as the Australian and New Zealand Standard Industry Classification or "ANZSIC" for short or
the U.S. Bureau of the Census Industrial Classifications (USBCIC). The associated
classification codes [CC] attribute value could contain the codes and/or descriptions of the
codes of the named standard with or without modifications, deletions or extensions. For
example: CN="ANZSIC" CC="61;Road transport" or CN="USBCIC" CC="581;Hardware store".
Service classifications such as the International Standard Classification of Occupations could
be used. For example: CN="ISCO" CC="4430;Auctioneer". Product classifications such as the
Harmonized Commodity Description And Coding System could be used. For example:
CN="HS" CC="8411;Turbojets, turbo-propellers & other gas turbines; parts thereof". For
subject classifications, Dewey Decimal, and/or Universal Decimal and/or Library of Congress
and/or Bliss and/or Colon Classification could be used. For example: CN="DDC"
CC="577.699;Sea shore ecology". The inclusion of subject classifications provides a very
simple, straightforward method of classifying the subject matter of an HTML document which
could be attractive to commercially oriented copyright owners.

The text ([TEXT]), person ([PNC] - [PR]), organisation ([ON] - [OID]), physical or post or
delivery address ([AT] - [ANN]), telephone ([TT] - [TL#]), email address ([ET] - [EA]) and
Internet address [IURL] are intended to be associated with each other in the obvious manner.
Date & time(s) ([DTF] - [DTZ]) are intended to indicate the times at which the address and/or
telephone and/or email will be serviced by the associated person(s) and/or organisation(s).
The brand name ([BN]) attribute is intended to hold commercial brand names. Public key ([KT]
- [K]) is intended to hold public encryption keys for secure communication with the contact
person or organisation.

The geographical location [GL] may be a latitude and longitude (eg
E148D31'12.5",S36D40'09.5" or E148.5201,S36.6693 or 148.5201,-36.6693), or a Universal
Grid Reference (eg 55FV364402) or other global, national, regional or local location reference
with units as specified [GLU], which is typed in or obtained by pointing to a digitally encoded
map or other methods. In more populated regions of some countries such as the U.S., street
addresses and post codes are associated with a moderately accurate geographic location and
can be used to interpolate geographic location data where geographic location data is not
explicitly stated in the CCG-data. Using a universally recognised code such as latitude and
longitude has advantages when used with international mediums like the Internet.
Geographical location is intended to be associated with a post delivery address or physical
address such as place of business or residence. A CCG compliant browser could use this
reference to display a map centred on that geographic location. The purpose of the
geographical location data is to allow browser users to specify search engine search criteria
which will result in the search engine selecting only those Internet accessible documents which
provide details about providers which are within a specified region. The serviced region [GLR]
is intended to indicate the preferred area of operation of providers expressed in terms of
serviced region units [GLRU]. A radial distance (eg in kilometres) or alternate means of
expressing an area of interest around a geographic point, such as polygons, are envisaged.
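
By way of illustration only, a radial serviced region test can be sketched using the haversine great-circle formula. The choice of haversine, the mean Earth radius constant and the record layout (a "GL" pair of signed decimal degrees and a "GLR" radius in kilometres) are assumptions of this sketch and are not prescribed by the specification.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in signed decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def services(provider, customer_lat, customer_lon):
    """True if the customer lies within the provider's serviced region radius."""
    lat, lon = provider["GL"]
    return distance_km(lat, lon, customer_lat, customer_lon) <= provider["GLR"]
```

A search engine could apply `services` to each candidate provider record to keep only those whose serviced region contains the enquirer's location. Polygon regions would need a different test.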

It is envisaged that the CCG attribute_value could be composed of more than one value
(actually sub-value) wherein specific characters or character strings separate individual values.
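
By way of illustration only, splitting an attribute value into sub-values might look like the following. The semicolon separator is an assumption of this sketch, chosen because the examples in this specification separate codes from descriptions with ";".

```python
def split_values(attribute_value, separator=";"):
    """Split an attribute value into its sub-values, trimming surrounding spaces."""
    return [part.strip() for part in attribute_value.split(separator) if part.strip()]
```

Note that a fixed separator cannot distinguish a separator from the same character used inside a description, so a practical scheme would need an escaping rule.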

While specific instances of element names and types have been given in this example, of
more importance is the type of data and the type of controls over the display and indexing of the
data. As an alternative to the preferred immediately following example where the CCG-data is
lumped together under the HTML element named "CCG", certain elements of the data, for
example the classification data, could be lumped under separate HTML elements with
distinctly different names thereby separating CCG classification data from CCG contact data.
However, this is not preferred because the strength of association between the two types of
data is weakened.

Example 2: Classification of Portion of a Web Page. 

Where it is desired to classify a portion of a web page, such as a paragraph about a product,
simple CCG-data may be used in conjunction with the syntax of Example 1. For example:
<A NAME="Radios">AM-FM radio receivers: </A>
<CCG HREF="#Radios">
CN="ANZSIC"
CC="E23.34.78;Electrical equipment - radio receivers AM"
CC="E23.34.79;Electrical equipment - radio receivers FM"
</CCG>
We won't be beaten on the price of these high quality receivers ....

In this example, the CCG phrase appears after the related anchor (<A NAME=...</A>).
However, while such proximity visually provides an obvious association between the anchor
and related CCG phrase, it is intended that a CCG phrase containing the attribute HREF related
to a specific anchor could appear anywhere within the body of a web page and remain related
to the named anchor. The CCG phrase containing the attribute HREF could appear in a
separate document and thereby relate the CCG-data to the entire document or to a named
anchor although, as previously noted, coordinating separate documents can be problematic. In
the absence of the HREF and NAME attributes, it is also intended that the CCG-data apply to
the whole web page.

Example 3: Classification of Portion of a Web Page using XML Syntax

Using XML syntax and similar attribute names to those of Example 2, the HTML fragment of
Example 2 may be rewritten as:
<A NAME="Radios">AM-FM radio receivers: </A>
<XML>
<CCG>
<HREF>"#Radios"</HREF>
<CN>"ANZSIC"</CN>
<CC>"E23.34.78;Electrical equipment - radio receivers AM"</CC>
<CC>"E23.34.79;Electrical equipment - radio receivers FM"</CC>
</CCG>
</XML>
We won't be beaten on the price of these high quality receivers ....

This example demonstrates that the translation of CCG-data from HTML to XML (and the
reverse) involves simple syntactical and grammatical translations. Of course, the resulting
HTML and XML, while 'well formed', might not be recognised or, if recognised, might not be
understood by some parsers.
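
By way of illustration only, a fragment in the XML form of Example 3 can be read with a general purpose XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` module; the function name is hypothetical, and stripping the literal quotation marks from the element text is a choice made for this sketch.

```python
import xml.etree.ElementTree as ET

FRAGMENT = """<XML>
<CCG>
<HREF>"#Radios"</HREF>
<CN>"ANZSIC"</CN>
<CC>"E23.34.78;Electrical equipment - radio receivers AM"</CC>
<CC>"E23.34.79;Electrical equipment - radio receivers FM"</CC>
</CCG>
</XML>"""


def extract_ccg(xml_text):
    """Return one list of (name, value) pairs for each CCG phrase in the text."""
    root = ET.fromstring(xml_text)
    phrases = []
    for ccg in root.iter("CCG"):
        pairs = [(child.tag, (child.text or "").strip().strip('"')) for child in ccg]
        phrases.append(pairs)
    return phrases
```

Only the XML island is parsed here. Locating that island inside a surrounding HTML page is a separate step which this sketch does not attempt.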

Example 4: Constructing a Web Page Containing CCG-data 

As an example, a web page developer, Alice Jamieson, is preparing an advertisement for a
local electrician John Williams, trading as Kelso Electrical, who wants to advertise on the web
for business within 30 kilometres from his office located at 18 Raglan Street, Kelso, New South
Wales. Alice uses a graphical user interface web page authoring tool capable of creating and
modifying web pages containing HTML (and XML) CCG phrases by accepting inputs from a
user. The tool executes on a digital computer having input devices such as a keyboard,
mouse, light pen and touch pad, display devices such as a CRT, LED arrays, liquid crystal
arrays and computer-readable media such as magnetic and optical disks, memory arrays,
magnetic tape and the like.

The authoring tool also embodies knowledge of the content and structure of CCG phrases
such as the attribute names, valid ranges and sets of associated attribute values, the normal
order of the attributes in the CCG phrase and interdependences between attribute values. The
tool provides a window where web pages may be viewed in layout (browser) mode and
another window where the HTML code may be viewed in editing mode. The tool also provides
means of inserting, deleting, modifying and organising HTML elements, changing font size,
face and colour and so forth. The tool provides means for the user to build CCG phrases by
using input devices to select an edit control representing various types of CCG attributes from
a list which the tool then inserts in the body of a web page together with, when not already
present, HTML code indicative of the start and end of a CCG phrase. The user then types in
the value in the attribute. Similarly, the tool provides means of converting web page text to
CCG attributes. Using input devices, the user selects the text to be converted to a CCG
attribute then selects an edit control from a list; the tool then inserts the HTML code necessary
to encode the text as a CCG attribute. However, these semi-manual methods of creating and
modifying CCG phrases are inefficient and error prone. The tool also provides a button, which
can be activated by using input devices, for access to CCG phrase editing functions. The CCG
editing functions consist of a means of extracting the CCG values from existing CCG phrases
in the web page being edited, forms for entering and modifying the extracted CCG values, a
layout view browser window for altering how the CCG-data displays (position, font size, face,
colour, bold, normal, hidden or shown and so forth), a data view browser window to alter
which CCG-data values are to be indexed or not indexed in search engine databases, and a
means of deleting existing CCG phrases from web pages and inserting new or changed CCG
phrases in web pages. Editing cursors marking the current location at which text and/or data
may be inserted, deleted or modified are provided in each window and form.

In the current example, the web page initially contains no CCG phrase. Clicking the CCG
editing function button of the authoring tool causes a form to appear. The form contains
prompts related to CCG attribute names and associated data input fields related to the CCG
attribute values associated with the CCG attribute names, that is CCG-data. The fields are
blank because, in the web page layout view, the edit cursor is not over a CCG phrase (and
cannot be since the web page initially contains no CCG phrase). The service classifications
relevant to the web page, John Williams' physical business contact address, phone and fax
numbers, email address and geographic location and his post office business contact
addresses are entered into the forms using a keyboard and mouse. The developer, Alice
Jamieson, also includes her basic contact details where provided for on the form. The forms
use drop down lists to select address blocks (eg physical and post office) for editing. Logic
associated with the forms validates the CCG attribute values and interdependences. Input
devices are then used to control the CCG-data layout view browser to modify the appearance
of the CCG-data such as font size and colour and positioning. In the layout browser, input
devices communicating with the edit cursor are used to highlight individual items and blocks of
items to be changed. The post office address is highlighted as a block and moved into position
in line with the physical address. The CCG-data view window is then used to check which data
items are to be indexed by search engines. In this example all CCG-data (ie all CCG attribute
values except display control values and database control values) are to be indexed. Input
devices are used to control the edit cursor to highlight the entire data and a mouse is used to
click (activate) a button to mark all the data for indexing. Then another button is clicked which
builds an HTML encoded CCG phrase of CCG attributes derived from the CCG-data values,
display control values and database control values and inserts the CCG phrase in the web
page at the location pointed to in the web page layout browser window.

The HTML code editing mode window was called up which revealed the following HTML 
encoded CCG phrase in the web page: 
<XML>
<CCG>
<INDEX/>
<HIDE/>
<CN>ANZSIC</CN>
<CC>D36.11.45;Electrical contractors - residential</CC>
<CC>D36.11.46;Electrical contractors - industrial</CC>
<SHOW/>
<CONTACT/> <COPYRIGHT/>
<BUSINESS/>
<XPOS>50</XPOS>
<YPOS>320</YPOS>
<ALIGN>centre</ALIGN>
<SIZE>3</SIZE>
<COLOR>black</COLOR>
<FACE>Times New Roman</FACE>
<BOLD/>
<CLEAR>all</CLEAR>
<TEXT>Contact :</TEXT>
<PNC>Mr</PNC>
<PNG>John</PNG>
<PNF>Williams</PNF>
<PQ>AIE</PQ>
<PA>ARUC</PA>
<NEWLINE/>
<PT>Managing Director</PT>
<NEWLINE/>
<ON>Kelso Electrical Pty. Ltd.</ON>
<NEWLINE/>
<NORMAL/> <ITALIC/>
<SIZE>-2</SIZE>
<TEXT>NSW License 45678C</TEXT>
<NEWLINE/>
<NORMAL/> <BOLD/>
<SIZE>+2</SIZE>
<AT>PHYSICAL</AT>
<AS#>18</AS#>
<ASN>Raglan Street</ASN>
<NEWLINE/>
<ACN>Kelso</ACN>
<NEWLINE/>
<ARN>NSW</ARN>
<NEWLINE/>
<HIDE/>
<ANN>Australia</ANN>
<NEWLINE/>
<SHOW/>
<TEXT>Phone:</TEXT>
<TT>PREFERRED ; VOICE ; MESSAGE</TT>
<HIDE/>
<TC#>61</TC#>
<SHOW/>
<TT#>0</TT#>
<TA#>63</TA#>
<TL#>456-7828</TL#>
<TEXT> Fax:</TEXT>
<TT>FACSIMILE</TT>
<HIDE/>
<TC#>61</TC#>
<SHOW/>
<TT#>0</TT#>
<TA#>63</TA#>
<TL#>456-7829</TL#>
<NEWLINE/>
<ET>INTERNET</ET>
<EA>johnw@firefly.com.au</EA>
<TEXT> </TEXT>
<GLU>LatLong</GLU>
<GL>-33.3978S:148.5679E</GL>
<GLRU>Km</GLRU>
<GLR>30</GLR>
<SET_SEPARATOR/>
<XPOS>250</XPOS>
<YPOS>320</YPOS>
<NEWLINE/>
<NEWLINE/>
<TEXT>Or write to us at :</TEXT>
<NEWLINE/>
<ON>Kelso Electrical Pty. Ltd.</ON>
<NEWLINE/>
<AT>POST-OFFICE</AT>
<AP#>P.O. Box 187</AP#>
<NEWLINE/>
<APN>Sunny Corner</APN>
<TEXT> </TEXT>
<APC>2795</APC>
<NEWLINE/>
<HIDE/>
<ANN>Australia</ANN>
<SET_SEPARATOR/>
<HIDE/>
<DEVELOPER/>
<BUSINESS/>
<PNG>Alice</PNG>
<PNF>Jamieson</PNF>
<ET>INTERNET</ET>
<EA>aljam@firefly.com.au</EA>
<IURL>http://www.firefly.com.au/~aljam/</IURL>
</CCG>
</XML>

In the web page layout browser window the CCG-data displayed as follows:

Contact :                               Or write to us at :
Mr John Williams, AIE, ARUC,
Managing Director
Kelso Electrical Pty. Ltd.              Kelso Electrical Pty. Ltd.
NSW License 45678C                      P.O. Box 187
18 Raglan Street                        Sunny Corner 2795
Kelso
NSW
Phone:063-456-7828 Fax:063-456-7829
Email: johnw@firefly.com.au Map

Having encoded the web page in this way, Alice then posts it on the storage device of a digital
computer connected to the Internet from where it can be retrieved through the Internet using
the URL "http://www.firefly.com.au/~johnw/index.html".

Example 4: Constructing a Database from Web Pages Containing CCG-data

During a routine sweep of Internet connected web page servers, a web crawler (or robot)
operating on a server named 'ccg.search.com' executing on an Internet connected digital
computer discovers the URL "http://www.firefly.com.au/~johnw/index.html" in a document it
had previously retrieved through the Internet. The web crawler decides that the URL matches
its selection criteria because the URL contains the suffix ".html". The web crawler then
retrieves the document as follows. It extracts from the URL the address of the computer
hosting the document, then addresses and sends a message (including the address of the web
crawler) requesting the web page through the network to the web page host computer using
TCP/IP protocol. The host computer then reads the document, addresses and sends the
document to the web crawler using TCP/IP protocol. The web crawler waits until it has
received all parts of the web page from the host computer before proceeding. It inspects the
contents of the document and finds that it matches the additional selection criterion that it is an
HTML encoded document. The web crawler program, depending on its state and logic, then
parses the document, strips out and saves some or all of the URLs in the document for future
examination. The web crawler program then passes the document together with the URL of
the document through a network communications channel to an indexing program executing
on a different computer. The indexing computer has database updating software which
manipulates a database stored on computer-readable media.
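
By way of illustration only, the step in which the crawler strips URLs out of a retrieved document and keeps those matching its selection criteria can be sketched with Python's standard `html.parser` and `urllib.parse` modules. The class and function names are hypothetical, and the ".html" suffix test simply mirrors the criterion described above. No network access is performed in this sketch.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the targets of <A HREF=...> elements as absolute URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def candidate_urls(document_url, html_text):
    """Return linked URLs which match the '.html' suffix selection criterion."""
    collector = LinkCollector(document_url)
    collector.feed(html_text)
    return [url for url in collector.links if url.split("#")[0].endswith(".html")]
```

The returned URLs would be saved for future examination, while the document itself is passed on to the indexing program.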

The indexing program parses the document from first to last character, indexing some of the
meta data in the <head> of the document and the words in the text of the document with
respect to the document URL. In the database of this example, unique words extracted from
the documents already indexed are held in separate rows of a column of a database table and
in another column of the same table on each row is an associated pointer to the first bucket or
block of URLs of documents containing the word associated with the pointer. As new words
are found, the new word is added as a new row in the word column of the table, a new bucket
is created, the URL of the document containing the new word is inserted into the bucket and a
pointer to the new bucket is written in the new row pointer column. When the same word is
found in another document, the row in the table of the word is found, the pointer is retrieved
from the table, the bucket pointed to by the pointer is retrieved and the URL of the other
document is inserted in the bucket. Where a bucket becomes full of URLs, a new bucket is
created and a pointer to the new bucket for holding additional URLs is placed in the full bucket.
Deletion of words and URLs of changed or no longer existing documents is also provided for.
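
By way of illustration only, the word table with chained buckets described above can be sketched in memory as follows. The class names and the deliberately tiny bucket capacity are assumptions of this sketch; a practical database would hold the table and buckets on computer-readable media and would also provide for deletion, which is omitted here.

```python
BUCKET_CAPACITY = 2  # deliberately tiny so that chaining is easy to observe


class Bucket:
    """A block of URLs with a pointer to an overflow bucket."""

    def __init__(self):
        self.urls = []
        self.next = None


class WordIndex:
    """Table mapping each unique word to its first bucket of URLs."""

    def __init__(self):
        self.table = {}

    def add(self, word, url):
        bucket = self.table.setdefault(word, Bucket())
        # Walk the chain to its last bucket, stopping early on a duplicate URL.
        while True:
            if url in bucket.urls:
                return
            if bucket.next is None:
                break
            bucket = bucket.next
        if len(bucket.urls) >= BUCKET_CAPACITY:
            # The last bucket is full: create a new one and point to it.
            bucket.next = Bucket()
            bucket = bucket.next
        bucket.urls.append(url)

    def lookup(self, word):
        urls, bucket = [], self.table.get(word)
        while bucket is not None:
            urls.extend(bucket.urls)
            bucket = bucket.next
        return urls
```

The same structure could hold CCG property values in place of words, with each value pointing to the bucket of URLs of the documents containing it.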

In addition to indexing words extracted from the text of the document, the indexing program
also indexes the CCG-data in the document as well as indexing words found in the CCG-data.
When the parser finds HTML element "<XML>" in the document it switches into XML parsing
mode and switches out of that mode when "</XML>" is found. When the element "<CCG>" is
found, the parser switches into the CCG parsing mode and switches out of that mode when
"</CCG>" is found.

The example database has a CCG-data attribute name to database property name
correspondence table to show the relationship between the CCG-data attribute names and the
database tables and columns (properties) where the CCG-data attribute values are to be
stored in the database as database property values. The database property values and
associated URLs are stored in much the same way as for words extracted from text as
outlined above. However, CCG contact data, for example, which consists of several distinct
CCG-data attributes which are related (eg street name, city), is stored in a database table
having a column (property) related to each distinct CCG contact attribute name, and each
set of related CCG contact data (eg person's name, address, telephone number) as separated
by "<CCG>", "<SET_SEPARATOR/>" and "</CCG>" is held in a separate row in the table. The
values stored in each row are considered to be a set of associated property values of different
types.
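
By way of illustration only, dividing the contact data of one CCG phrase into table rows at each set separator can be sketched as follows. The list of contact columns and the representation of the phrase as (name, value) pairs are assumptions of this sketch.

```python
# Hypothetical subset of contact attribute names which have a column in the table.
CONTACT_COLUMNS = {"PNG", "PNF", "ON", "AS#", "ASN", "ACN", "AP#", "APN", "TL#"}


def rows_from_phrase(items, url):
    """Return one row (dictionary) for each set of associated contact values."""
    rows, current = [], {}
    for name, value in items:
        if name == "SET_SEPARATOR":
            # End of the current set: start a new row.
            if current:
                rows.append(current)
            current = {}
        elif name in CONTACT_COLUMNS:
            current[name] = value
    if current:
        rows.append(current)
    for row in rows:
        row["URL"] = url  # associate every row with the document URL
    return rows
```

Values appearing before and after a separator therefore never share a row, although every row still carries the URL of the same web page.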
The indexing program, during parsing the document of Example 2 above, encounters the
"<CCG>" element and enters the CCG parsing mode. The parser knows to ignore display
control attributes and to consider database control elements in the CCG phrase. The example
indexing program opts to index all other CCG-data contained in the attribute values until
explicitly instructed not to index the attribute values by encountering the "<NOINDEX/>"
database control element and then to recommence indexing when the "<INDEX/>" database
control element is encountered.

Taking each CCG-data attribute name and associated attribute value(s) in succession, the
example indexing program uses the correspondence table to translate the CCG-data attribute
name to the database table and column (property) names where the CCG-data attribute
value(s) are to be stored as database property value(s). The indexing program may opt to
translate the CCG-data attribute values to database property values by, for example,
converting character strings of digits to binary encoded decimal representation, the string
"True" to a single bit representation and the like. The indexing program then adds or updates
the database property value(s), using the database table and column (property) names (or
similar references) obtained by translation, in much the same manner as outlined above for the
update of the database using words extracted from the document text, including associating
the data to the document URL where desired. Where the CCG-data contains an "HREF"
attribute (or similar), the URL associated with the other CCG-data is a URL taken from the
"HREF" attribute value, or composed of the document URL and the "HREF" attribute value if
the attribute value is a partial or relative URL. Some CCG attributes, such as "<BUSINESS/>",
have only an implied value of true if the attribute is present and false if the attribute is absent,
the "<SET_SEPARATOR/>", "<CCG>" and "</CCG>" resetting such values to false. However,
where attribute value(s) associated with different attribute names are still related, such as a
person's name and a street name, the related values of different types are stored on the same
row of the same database table but in a different column (database property) to preserve the
relationship. "<SET_SEPARATOR/>" limits the degree of relatedness between, for example, a
person's name occurring before the separator and a street name occurring after the separator.
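
The translation and row-building steps above can be sketched as follows. This is a minimal
sketch in Python under stated assumptions: the CCG phrase is assumed to have been parsed
already into (name, value) pairs, the entries of CORRESPONDENCE, the FLAGS set and the function
name build_rows are hypothetical, and a single database table is assumed for brevity.

```python
from urllib.parse import urljoin

# Hypothetical correspondence table: CCG attribute name -> (table, column).
CORRESPONDENCE = {
    "PNG": ("contact", "PNG"),
    "PA": ("contact", "PA"),
    "PT": ("contact", "PT"),
}
# Attributes with only an implied value: true if present, false if absent.
FLAGS = {"BUSINESS"}


def build_rows(attributes, document_url):
    """Return one dict per set of associated values found in a CCG phrase."""

    def new_row():
        # Each new set starts with the document URL and all flags reset to false.
        row = {"URL": document_url}
        row.update({flag: False for flag in FLAGS})
        return row

    rows, row, has_data, indexing = [], new_row(), False, True
    for name, value in attributes:
        if name == "NOINDEX":
            indexing = False
        elif name == "INDEX":
            indexing = True
        elif name == "SET_SEPARATOR":
            # The separator closes the current set and starts a new row.
            if has_data:
                rows.append(row)
            row, has_data = new_row(), False
        elif not indexing:
            continue
        elif name == "HREF":
            # A partial or relative URL is composed with the document URL.
            row["URL"] = urljoin(document_url, value)
        elif name in FLAGS:
            row[name], has_data = True, True
        elif name in CORRESPONDENCE:
            _table, column = CORRESPONDENCE[name]
            row[column], has_data = value, True
    if has_data:
        rows.append(row)
    return rows
```

Values on either side of a "SET_SEPARATOR" entry end up in different rows, so a person's name
before the separator is not associated with a street name after it.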

Using the example document and using the same database column (property) names as used
for the CCG-data attribute names, a portion of the constructed database table would look
like:

PNC | PNG  | PMF    | PQ  | PA   | PT                | URL
Mr  | John | WBGsns | A1E | ARUC | Managing Director | (pointer)

Difficulties not highlighted by this example are the need to handle properties having multiple
values of the same type, "sparse rows" where only a few values are not null (blank) and tables
with extremely large numbers of rows. For example, the CCG-data of this example could have
contained multiple values of personal qualifications ("PQ"). To represent this type of data using
a 2 dimensional table database system, the database would be "normalised" so that the
multiple values were stored in a separate table and keys or pointers were used to relate the
items in the two tables. Numerous alternate database systems, for example those
based on key hashing and data buckets, or tagging data values with prefixes or suffixes
related to the type of data value, may be used. Preferably, however, whatever database
system is used, it should preserve the associations of CCG-data items present in the CCG
phrases.
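
The normalisation described above can be sketched with two tables related by a key. This is
an illustrative sketch only, using Python's built-in sqlite3 module; the table names contact
and contact_pq, the function names and the sample values are assumptions and do not come
from the example database.

```python
import sqlite3

# In-memory database with a main table and a separate table for the
# multi-valued personal qualification ("PQ") property.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE contact (id INTEGER PRIMARY KEY, PT TEXT, URL TEXT);
    CREATE TABLE contact_pq (
        contact_id INTEGER REFERENCES contact(id),
        PQ TEXT
    );
    """
)


def store_contact(conn, title, url, qualifications):
    """Insert one contact row plus one contact_pq row per qualification."""
    cur = conn.execute(
        "INSERT INTO contact (PT, URL) VALUES (?, ?)", (title, url)
    )
    contact_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO contact_pq (contact_id, PQ) VALUES (?, ?)",
        [(contact_id, q) for q in qualifications],
    )
    return contact_id


def qualifications_for(conn, contact_id):
    """Follow the key from the contact row to its qualification values."""
    rows = conn.execute(
        "SELECT PQ FROM contact_pq WHERE contact_id = ? ORDER BY rowid",
        (contact_id,),
    )
    return [r[0] for r in rows]
```

The key contact_id plays the role of the "keys or pointers" mentioned above, so the association
between a contact and each of its qualification values is preserved.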

Because the geographic location data was missing from the postal address of the CCG-data in
the example document, but a post code was present, the indexing program inferred the
geographic location from the post code.

Example 6: Finding Web Page References Using a CCG Database 

As an example, Kevin Robson lives in Sydney but owns and has rented out a house in
Bathurst. He wants to use the web to find some electricians based in the general Bathurst
region (not only in Bathurst City) to contact for estimating the cost of modifying the wiring in the
house. He uses his web browser to open the web page
"http://www.ausline.com.au/web_search.html" containing AusLine's search engine web page
search criteria input form encoded using the HTML "<form>" element.

The search criteria input form contains several input fields including those labelled "Service
classification", "Key words", "City/Suburb/Town", "Country", "Lat/Long" and "Radius". The form
also displays a button labelled "Map" to allow latitude and longitude to be selected by pointing
to map images. The word "electrician" is typed into the "Service classification" field, "house
wiring" into the "Key words" field, "Bathurst" into the "City/Suburb/Town" field and "10" into the
"Radius" field. The country "Australia" was already showing in the country field because the
web page server had received cookie data from the browser indicating that that was the
country used when the browser last used the web page. The "Submit search" button on the
web page was clicked. The browser transmitted a message using TCP/IP protocol to the
AusLine server containing the input field values encoded in the header of the message.

After a short delay, the search result HTML encoded web page was returned. Clicking on the
"Service classification" input field drop down list box to check the classifications used in the
search revealed three items:

• Electrical contractors - residential

• Electrical contractors - industrial

• Electrical engineers

The search engine attached to the server obtained those classifications by using word
stemming and searching the text of the service classifications held in its database. The
Lat/Long field contained the value "33.3856S;148.5743E", which the search engine obtained
by looking up the latitude and longitude of the town "Bathurst" in the country "Australia" in its
database. Clicking on the "Map" button retrieved a web page having the image of a map
centred on the town of Bathurst and showing the area 20 Km around it. The search engine
obtained the map by making a request to another Internet connected server and supplying the
latitude, longitude and radius. Clicking on the browser "Back" button returned to the search
results page.

The search results contained 8 titles, brief descriptions and URLs including a reference
containing the URL "http://www.firefly.com.au/~johnw/index.html". Retrieving each in turn
revealed that all were well focused according to the search criteria, being related to electricians,
electrical contractors and engineers in the Bathurst area. The search engine obtained these
references to web pages by:

• searching its database of service classification titles with words stemming from
"electrician", which resulted in three service classification codes,

• searching its database using the three service classification codes to obtain an
intermediate list of URLs of web pages containing those CCG codes,

• searching its database for the two keywords to obtain an intermediate list of URLs of
web pages containing those words in the web page text,

• searching its database to find the latitude and longitude of Bathurst, Australia,

• searching its database to obtain an intermediate list of web pages which contain
latitude and longitude data lying within 10 Km of the latitude and longitude of
Bathurst, Australia,

• producing as a result list, a list of URLs which are common to all the intermediate lists,

• obtaining from its database the title and brief description of the web pages,

• formatting the titles, descriptions and URLs into an HTML encoded report,

• transmitting the report to the enquiring web browser.
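
The list of steps above amounts to building intermediate sets of URLs and intersecting them.
The following Python sketch illustrates that idea only. The in-memory index layout (a
dictionary with "class", "word" and "geo" entries), the function names, the use of the
haversine formula for the distance test, and the choice to require every keyword are all
assumptions, not details of the AusLine search engine.

```python
import math


def within_radius(lat1, lon1, lat2, lon2, radius_km):
    """Great-circle (haversine) distance test between two points in degrees."""
    earth_radius_km = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a)) <= radius_km


def search(index, class_codes, keywords, centre, radius_km):
    """Return the URLs common to all the intermediate lists, sorted."""
    # Intermediate list 1: pages carrying any of the classification codes.
    by_class = set().union(*(index["class"].get(c, set()) for c in class_codes))
    # Intermediate list 2: pages whose text contains every keyword (assumed).
    word_sets = [index["word"].get(w, set()) for w in keywords]
    by_words = set.intersection(*word_sets) if word_sets else set()
    # Intermediate list 3: pages whose latitude/longitude lies within the radius.
    by_geo = {
        url
        for url, (lat, lon) in index["geo"].items()
        if within_radius(centre[0], centre[1], lat, lon, radius_km)
    }
    # Result list: URLs common to all the intermediate lists.
    return sorted(by_class & by_words & by_geo)
```

A production system would normally evaluate these lookups inside the database rather than in
memory, but the intersection of intermediate lists is the same.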

Example 7: Finding Contact Details Using a CCG Database 

As an example, Jim Jones of Jones and Sons wants to send a recall notice about a faulty
batch of UV stabilised electrical power cable to all Electrical contractors and Electrical
wholesalers in Australia who have email addresses. He uses his web browser to open the web
page "http://www.ausline.com.au/contact_search.html" containing AusLine's search engine
contact search criteria input form encoded using the HTML "<form>" element.



The search criteria input form contains several input fields including those labelled "Service
classification", "Country" and "Output format". The word "electric" is typed into the "Service
classification" field, the word "Australia" is typed into the "Country" field and the "Tabular -
Name & Email" option in the "Output format" drop down list box is selected. The "Submit
search" button on the web page is clicked. The browser transmits a message using TCP/IP
protocol to the AusLine server containing the input field values encoded in the header of the
message.

After a short delay, the search result HTML encoded web page is returned. Clicking on the
"Service classification" input field drop down list box to check the classifications used in the
search revealed too many classifications for the result to be sufficiently focused. The following
four classifications were selected from the list:

• Electric cable - ducting systems

• Electrical contractors - residential

• Electrical contractors - industrial

• Electrical wholesalers

and the "Submit search" button is pressed again to refine the search.

The search results contained 3,473 names and associated email addresses and URLs to full
contact details. Jim saved the search result page on his computer so that he could use his
email program to send the recall notice to each email address in the list. The email address
"johnw@firefly.com.au" was included in the list.

The search engine obtained these references to web pages by:

• searching its database using the four service classification titles, which resulted in four
service classification codes,

• searching its database using the four service classification codes to obtain an
intermediate list of database primary keys of database table rows containing those
service classification codes in the database Service classification attribute,

• searching its database using the country name "Australia" to obtain an intermediate
list of database primary keys of database table rows containing that word in the
database Country attribute,

• producing as a result list a list of database primary keys which are common to both
the intermediate lists,

• obtaining from its database, using the result list, the values of the name and email
attributes,

• using the HTML <table> element to format the name values, email values and full
detail URLs into an HTML encoded report,

• transmitting the report to the enquiring web browser.
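
The contact search steps above can be sketched in the same way, this time intersecting lists
of primary keys and formatting the result with the HTML <table> element. This Python sketch
is illustrative only: the row layout (a dictionary keyed by primary key), the field names and
the function name contact_report are assumptions, and rows without an email address are
skipped because the example asks only for contacts who have one.

```python
from html import escape


def contact_report(rows, class_codes, country):
    """Return an HTML table of name, email and full-detail URL for matching rows."""
    # Intermediate list 1: primary keys of rows with a matching classification code.
    by_class = {key for key, r in rows.items() if r["class_code"] in class_codes}
    # Intermediate list 2: primary keys of rows with the requested country.
    by_country = {key for key, r in rows.items() if r["country"] == country}
    # Result list: primary keys common to both intermediate lists.
    keys = sorted(by_class & by_country)

    lines = ["<table>", "<tr><th>Name</th><th>Email</th><th>Details</th></tr>"]
    for key in keys:
        r = rows[key]
        if not r.get("email"):
            continue  # only contacts that have an email address are reported
        lines.append(
            '<tr><td>{}</td><td>{}</td><td><a href="{}">full details</a></td></tr>'.format(
                escape(r["name"]), escape(r["email"]), escape(r["url"])
            )
        )
    lines.append("</table>")
    return "\n".join(lines)
```

Escaping the values before placing them in the report keeps characters such as "&" in a
business name from corrupting the HTML.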

This example relates to finding sets of associated database contact values without requiring
references to web pages. However, finding other sets of associated database values such as
sets of associated industry classification values and geographic location values might also be
useful for some purposes.

Thus it is appreciated that the afore stated goals, advantages and objectives are achieved by
the teachings herein. In particular it is seen that, unlike the prior art, efficiently searchable
Yellow pages and White pages databases and the like may be automatically constructed from
HTML encoded web pages. Additionally the database entries may be automatically linked to
specific web pages and portions of web pages allowing convenient methods of indexing of
product and service catalogues and the like. It is also appreciated that simpler methods of
constructing databases suited to a variety of other uses such as industry and subject
directories are also provided.

From the foregoing teachings and with the knowledge of those skilled in the art, it is apparent
that other modifications and adaptations of the invention will become apparent. For example,
the method steps disclosed and claimed herein may be practiced in a variety of different
orders. CCG-data may take on a variety of different forms within the meaning of the claims.
Thus, it is our intention to include within the scope of the claims not only the invention literally
embraced by the language of the claims but to include all such modifications and adaptations
which may come to those skilled in the art.

What I claim is:

1. An HTML encoded web page embodied on a computer-readable medium, said web
page comprising at least one HTML encoded CCG phrase, each CCG phrase
comprising:

a) HTML code indicative of the start of a CCG phrase,

b) at least one CCG-data attribute, and

c) HTML code indicative of the end of a CCG phrase.

2. An HTML encoded web page embodied on a computer-readable medium, said web
page comprising at least one HTML encoded CCG phrase, each CCG phrase
comprising:

a) HTML code indicative of the start of a CCG phrase,

b) at least two CCG-data attributes,

c) at least one database control attribute separating said CCG-data attributes into at
least two sets of CCG attributes, and

d) HTML code indicative of the end of a CCG phrase.

3. An HTML encoded web page embodied on a computer-readable medium, said web
page comprising at least one HTML encoded CCG phrase, each CCG phrase
comprising:

a) HTML code indicative of the start of a CCG phrase,

b) at least one CCG-data attribute,

c) at least one attribute of: database control attributes, display control attributes, and

d) HTML code indicative of the end of a CCG phrase.

4. A computer implemented method of building a web page comprising at least one HTML
encoded CCG phrase, the method comprising the steps of:

a) displaying a web page on a computer display device,

b) displaying an edit cursor indicating a character position on said display device and
a corresponding character position in said web page, said edit cursor being
positionable within the display of said web page by use of computer input devices,

c) separately displaying on said computer display device a set of edit controls
representing CCG-data attribute types,

d) positioning said edit cursor within said display of said web page using said input
devices,

e) selecting an edit control from said set of edit controls using said input devices,

f) relating said selected edit control to a corresponding CCG-data attribute name,

g) constructing a CCG-data attribute character string comprising a character string
representing said attribute name and another character string representing an
empty CCG-data value,

h) if the said edit cursor is positioned outside a CCG phrase,

    i) inserting into said web page, at the character position indicated by said edit
    cursor, a start character string comprising HTML code indicative of the start
    of a CCG phrase,

    ii) inserting into said web page, immediately after the end of said start
    character string, an end character string comprising HTML code indicative of
    the end of a CCG phrase, and

    iii) positioning said edit cursor between said start and end character strings,

i) inserting said CCG-data attribute character string into said web page at the
character position indicated by said edit cursor,

j) positioning said edit cursor at the character position in said web page of the
CCG-data value of said inserted CCG-data attribute character string,

k) inputting characters using a keyboard,

l) inserting said input characters into said web page at the character position
indicated by said edit cursor, thereby converting said empty CCG-data value to a
non-empty CCG-data value, and

m) writing said web page on computer-readable media.

5. A computer implemented method of building a web page comprising at least one HTML
encoded CCG phrase, the method comprising the steps of:

a) displaying a web page on a computer display device,

b) displaying a start edit cursor and an end edit cursor on said display device, each
said edit cursor indicating a character position on said display device and a
corresponding character position in said web page, said edit cursors being
positionable within the display of said web page by use of computer input devices,

c) separately displaying on said computer display device a set of edit controls
representing CCG-data attribute types,

d) selecting a string of web page characters on said display device using said input
devices to position said start edit cursor to indicate the start of said string of web
page characters and said end edit cursor to indicate the end of said string of web
page characters,

e) selecting an edit control from said set of edit controls using said input devices,

f) relating said selected CCG-data control to a corresponding CCG-data attribute
name,

g) constructing a CCG-data attribute character string comprising a character string
representing said attribute name and another character string representing a
CCG-data value containing said string of web page characters,

h) deleting said string of web page characters from said web page,

i) if the said start edit cursor is positioned outside a CCG phrase,

    i) inserting into said web page, at the character position indicated by said start
    edit cursor, a start character string comprising HTML code indicative of the
    start of a CCG phrase,

    ii) inserting into said web page, immediately after the end of said start
    character string, an end character string comprising HTML code indicative of
    the end of a CCG phrase, and

    iii) positioning said start edit cursor between said start and end character
    strings,

j) inserting said CCG-data attribute character string into said web page at the
character position indicated by said start edit cursor, thereby converting said string
of web page characters to a CCG-data attribute value contained within a
CCG-data attribute contained within a CCG phrase, and

k) writing said web page on computer-readable media.

6. A computer implemented method of building a web page comprising at least one HTML
encoded CCG phrase, the method comprising the steps of:

a) displaying a CCG-data input form on a computer display device,

b) inputting CCG-data values into fields of said data input form using computer input
devices,

c) inserting into the body of a web page a start character string comprising HTML
code indicative of the start of a CCG phrase,

d) inserting into said web page body immediately after the end of said start character
string an end character string comprising HTML code indicative of the end of a
CCG phrase,

e) extracting successive field values from said data entry form together with related
field value type information,

f) relating the type of each extracted field value to a corresponding CCG-data
attribute name,

g) constructing a CCG-data attribute character string comprising a character string
representing said attribute name and another character string representing said
field value,

h) inserting said CCG-data attribute character string into said web page between said
start and end character strings, and

i) writing said web page on computer-readable media.

7. A computer implemented method of building a database which comprises sets of
associated property values wherein each set includes at least two property values of
different types, the property values being any of classification values, contact values,
geographic location values, hereinafter collectively referred to as CCG-data, the method
comprising the steps of:

a) retrieving successive web pages from a computer network, each web page being
identified by a URL,

b) searching each web page for a CCG phrase that includes a plurality of different
types of CCG-data attributes,

c) extracting a plurality of said attributes from said phrase,

d) from each extracted attribute, deriving an attribute name and a related attribute
value,

e) determining the type of said extracted attribute and said attribute value by
reference to said attribute name,

f) relating said type of attribute value so determined to a corresponding type of
database property value,

g) relating the URL of said web page to an other type of database property value,

h) writing said derived attribute value to the database property value of said
determined corresponding type in a set of associated property values, and

i) writing the URL of said web page to a database property value of said other type
in said set of associated property values.

8. A computer implemented method of building a database which comprises sets of
associated property values wherein each set includes at least two property values of
different types, the property values being any of classification values, contact values,
geographic location values, hereinafter collectively referred to as CCG-data, the method
comprising the steps of:

a) retrieving successive web pages from a computer network, each web page being
identified by a URL,

b) searching each web page for a CCG phrase that includes at least one type of
CCG-data attribute,

c) extracting at least one said attribute from said phrase,

d) from each extracted attribute, deriving an attribute name and a related attribute
value,

e) determining the type of said extracted attribute and said attribute value by
reference to said attribute name,

f) relating said type of attribute value so determined to a corresponding type of
database property value,

g) relating the URL of said web page to an other type of database property value,

h) writing said derived attribute value to the database property value of said
determined corresponding type in a set of associated property values, and

i) writing the URL of said web page to a database property value of said other type
in said set of associated property values.

9. A computer implemented method of building a database which comprises sets of
associated property values wherein each set includes at least two property values of
different types, the property values being any of classification values, contact values,
geographic location values, hereinafter collectively referred to as CCG-data, the method
comprising the steps of:

a) retrieving successive web pages from a computer network,

b) searching each web page for a CCG phrase that includes a plurality of different
types of CCG-data attributes,

c) extracting a plurality of said attributes from said phrase,

d) from each extracted attribute, deriving an attribute name and a related attribute
value,

e) determining the type of said extracted attribute and said attribute value by
reference to said attribute name,

f) relating said type of attribute value so determined to a corresponding type of
database property value, and

g) writing said derived attribute value to the database property value of said
determined corresponding type in a set of associated property values.

10. A computer implemented method of finding references to web pages posted on a
computer network, the method using a database comprising sets of associated property
values, the property values being any of classification values, contact values, geographic
location values, hereinafter collectively referred to as CCG-data, and URL references,
the method comprising the steps of:

a) receiving a query phrase including query relational expressions from a computer
network,

b) parsing said query phrase and extracting each of said query relational expressions
included therein,

c) from each extracted query relational expression, deriving a query field name,

d) determining the type of said query relational expression by reference to its derived
query field name,

e) relating said type of query relational expression so determined to one of the
following query relational expression types: CCG-data type, other type,

f) provided said query relational expression is a CCG-data type, deriving a query
relational operator and query value related to its query field name from said query
relational expression,

g) determining the type of said query value by reference to said query field name,

h) relating said type of query value so determined to a corresponding type of
database property value,

i) locating database property values of said determined corresponding type which
return a true value when tested against said query value using said query
relational operator,

j) extracting from said database a list of the URL references associated with the so
located database property values.

11. A computer implemented method of finding sets of associated database property values,
the method using a database comprising sets of associated property values wherein
each set includes at least two property values of different types, the property values
being any of classification values, contact values, geographic values, hereinafter
collectively referred to as CCG-data, the method comprising the steps of:

a) receiving a query phrase including query relational expressions from a computer
network,

b) parsing said query phrase and extracting each of said query relational expressions
included therein,

c) from each extracted query relational expression, deriving a query field name,

d) determining the type of said query relational expression by reference to its derived
query field name,

e) relating said type of query relational expression so determined to one of the
following query relational expression types: CCG-data type, other type,

f) provided said query relational expression is a CCG-data type, deriving a query
relational operator and query value related to its query field name from said query
relational expression,

g) determining the type of said query value by reference to said query field name,

h) relating said type of query value so determined to a corresponding type of
database property value,

i) locating database property values of said determined corresponding type which
return a true value when tested against said query value using said query
relational operator,

j) extracting from said database sets of associated database property values
associated with the so located database property values.

12. A method of displaying a web page comprising at least one HTML encoded CCG
phrase, the method comprising the steps of:

a) retrieving a web page from a computer network,

b) parsing said retrieved web page to locate an HTML code indicative of the start of a
CCG phrase,

c) parsing said located CCG phrase and extracting successive CCG attributes
contained therein until an HTML code indicative of the end of said CCG phrase is
found,

d) from each extracted attribute, deriving an attribute name,

e) determining the type of said extracted attribute by reference to its derived attribute
name,

f) relating said type of attribute so determined to one of the following attribute types:
database control, display control, CCG-data,

g) provided said extracted attribute is not a database control type, deriving an
attribute value related to its attribute name from said extracted attribute,

h) determining the type of said attribute value by reference to said attribute name,

i) relating said type of attribute value so determined to a corresponding type of
parameter of a display-device-control-program,

j) writing said attribute value to said parameter, and

k) where said type of attribute is a CCG-data type, causing said
display-device-control-program to effect display of said attribute value on a display device,
formatted and positioned according to said display-device-control-program
parameters, whereby successive values of CCG-data of the CCG phrase are
displayed.

ABSTRACT

A system for automatically creating databases containing industry, service, product and
subject classification data, contact data, geographic location data (CCG-data) and links to web
pages from HTML, XML or SGML encoded web pages posted on computer networks such as
the Internet or Intranets. The web pages containing HTML, XML or SGML encoded CCG-data,
database update controls and web browser display controls are created and modified by using
simple text editors, HTML, XML or SGML editors or purpose built editors. The CCG databases
may be searched for references (URLs) to web pages by use of enquiries which reference one
or more of the items of the CCG-data. Alternatively, enquiries referencing the CCG-data in the
databases may supply contact data without web page references. Data duplication and
coordination is reduced by including in the web page CCG-data display controls which are
used by web browsers to format for display the same data that is used to automatically update
the databases.






